Linguistically Motivated Complementizer Choice in Surface Realization
Abstract
This paper shows that using linguistically motivated features for English that-complementizer choice in an averaged perceptron classifier can improve upon the prediction accuracy of a state-of-the-art realization ranking model. We report results on a binary classification task that predicts the presence or absence of a that-complementizer, using features adapted from Jaeger’s (2010) investigation of the uniform information density principle in the context of that-mentioning. Our experiments confirm the efficacy of the features based on Jaeger’s work, including the information density–based features. The experiments also show that the improvements in prediction accuracy extend to cases in which the presence of a that-complementizer arguably makes a substantial difference to fluency or intelligibility. Our ultimate goal is to improve the performance of a ranking model for surface realization; to this end, we conclude with a discussion of how we plan to combine the local complementizer-choice features with those in the global ranking model.
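As a rough illustration of the kind of classifier the abstract describes, the following is a minimal sketch of an averaged perceptron for a binary that-present/that-absent decision. It is not the authors' implementation: the feature names (`high_surprisal_onset`, `pronoun_subject`, `long_complement`, `frequent_verb`), the toy training examples, and the function names are all hypothetical and chosen only for readability.

```python
from collections import defaultdict


def train_averaged_perceptron(examples, epochs=10):
    """Train on (features, label) pairs; label is +1 (that) or -1 (no that).

    Returns the average of the weight vector over all training steps,
    which is what distinguishes the averaged perceptron from the plain one.
    """
    weights = defaultdict(float)  # current weight vector
    totals = defaultdict(float)   # running sum of weight vectors
    steps = 0
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(weights[f] * v for f, v in feats.items())
            if label * score <= 0:  # misclassified (or on the boundary)
                for f, v in feats.items():
                    weights[f] += label * v
            steps += 1
            # Naive averaging: accumulate the full vector after every step.
            for f, w in weights.items():
                totals[f] += w
    return {f: total / steps for f, total in totals.items()}


def predict(weights, feats):
    """Return +1 (realize 'that') or -1 (omit it)."""
    score = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if score > 0 else -1


# Hypothetical toy data; the features are illustrative stand-ins only.
examples = [
    ({"bias": 1.0, "high_surprisal_onset": 1.0}, 1),
    ({"bias": 1.0, "pronoun_subject": 1.0}, -1),
    ({"bias": 1.0, "high_surprisal_onset": 1.0, "long_complement": 1.0}, 1),
    ({"bias": 1.0, "pronoun_subject": 1.0, "frequent_verb": 1.0}, -1),
]

model = train_averaged_perceptron(examples)
print(predict(model, {"bias": 1.0, "high_surprisal_onset": 1.0}))
print(predict(model, {"bias": 1.0, "pronoun_subject": 1.0}))
```

The per-step accumulation into `totals` is the simplest way to average and costs time proportional to the number of features at every step; practical implementations usually use lazy timestamp-based updates instead.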
Similar resources
A Flexible Shallow Approach To Text Generation
In order to support the efficient development of NL generation systems, two orthogonal methods are currently pursued with emphasis: (1) reusable, general, and linguistically motivated surface realization components, and (2) simple, task-oriented template-based techniques. In this paper we argue that, from an application-oriented perspective, the benefits of both are still limited. In order to i...
Linguistically Annotated BTG for Statistical Machine Translation
Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys linguistic knowledge of source-side syntax structures to BTG hierarchical structures through linguistic annotation. From the linguistically annotated da...
Hedge Trimmer: A Parse-And-Trim Approach To Headline Generation
This paper presents Hedge Trimmer, a HEaDline GEneration system that creates a headline for a newspaper story using linguistically-motivated heuristics to guide the choice of a potential headline. We present feasibility tests used to establish the validity of an approach that constructs a headline by selecting words in order from a story. In addition, we describe experimental results that demon...
The Advantage of the Ungrammatical
Sentences with multiple complementizers like "I told him that for sure that I would come" often occur in speech and even in writing, although they are not generated by any formal grammar. Here we conducted an acceptability study and a self-paced reading experiment to test whether these 'Multiple That' constructions are acceptable, and whether they are motivated by processing difficulty. Results s...
Hybrid Selection of Language Model Training Data Using Linguistic Information and Perplexity
We explore the selection of training data for language models using perplexity. We introduce three novel models that make use of linguistic information and evaluate them on three different corpora and two languages. In four out of the six scenarios a linguistically motivated method outperforms the purely statistical state-of-the-art approach. Finally, a method which combines surface forms and th...
Publication date: 2011